A crash course in image processing

If you're having trouble designing your video algorithm, here are a few tips.

Range of a pixel

VirtualDub's 32-bit ARGB pixel format consists of three color channels, each ranging from 0 to 255.  If you add to or subtract from pixel values, you must clip the results to [0,255] before constructing a pixel from them:

// Clamp each channel to the valid range before repacking.
if (r < 0) r=0; else if (r > 255) r=255;
if (g < 0) g=0; else if (g > 255) g=255;
if (b < 0) b=0; else if (b > 255) b=255;

dst[x] = (r<<16) + (g<<8) + b;

This generally means that if you need to do any arithmetic on a pixel, such as averaging several together, you will need to either unpack the pixels into triplets to get more range or mask off bits to reduce precision.  The latter generally leads to poor results, so you will want to unpack pixels.  In most cases 16-bit provides enough range, and it is especially convenient in MMX assembly.
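As a sketch of the unpacking approach, a pixel can be split into an integer triplet, filtered with extra headroom, and repacked with clamping.  The helper names below are illustrative, not part of the SDK:

```c
#include <stdint.h>

/* Illustrative helpers (not SDK calls): unpack a packed 0x00RRGGBB
   pixel into plain ints so intermediate arithmetic has headroom,
   then clamp and repack. */
typedef struct { int r, g, b; } Triplet;

static Triplet unpack(uint32_t px) {
    Triplet t;
    t.r = (int)((px >> 16) & 0xFF);
    t.g = (int)((px >>  8) & 0xFF);
    t.b = (int)( px        & 0xFF);
    return t;
}

static uint32_t pack_clamped(Triplet t) {
    if (t.r < 0) t.r = 0; else if (t.r > 255) t.r = 255;
    if (t.g < 0) t.g = 0; else if (t.g > 255) t.g = 255;
    if (t.b < 0) t.b = 0; else if (t.b > 255) t.b = 255;
    return ((uint32_t)t.r << 16) | ((uint32_t)t.g << 8) | (uint32_t)t.b;
}
```

Any arithmetic done between the two calls can then over- or underflow the 0-255 range freely, since the clamp happens only at repack time.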

Note that if you are adapting an algorithm that works in continuous pixel values in [0,1], you should divide by 255, not 256, to do the conversion!
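A minimal sketch of that conversion (the helper names are made up for illustration):

```c
/* Map a channel value in [0,255] to [0,1] and back.
   Dividing by 255 maps 255 exactly to 1.0; dividing by 256 would
   leave the maximum at 0.996 and shift everything slightly dark. */
double to_unit(int v)      { return v / 255.0; }
int    from_unit(double f) { return (int)(f * 255.0 + 0.5); }
```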

Convolution filters

Convolution filters are really simple: if you want to compute the output at a pixel x, grab a bunch of pixels around x, multiply each by a coefficient, and add the results together:

              center
    255   170    63    86    21    original pixels....
   1/16  4/16  6/16  4/16  1/16    each multiplied by a coefficient...
   ----------------------------
   15.9  42.5  23.6  21.5   1.3  ->  104.8

                           sum the results together
                              to get the result.
                            

Do this repeatedly, grabbing five source pixels around the output pixel and producing one output pixel.  This kind of filter is called a finite impulse response (FIR) filter: each output sample depends only on a finite window of input samples.
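The example above, applied across a whole scanline, might be sketched like this in C.  The clamped-edge handling is one common choice, not something the SDK mandates:

```c
#include <stdint.h>

/* 5-tap [1 4 6 4 1]/16 binomial blur over one scanline of 8-bit
   samples.  Edge pixels are handled by clamping the sample index. */
static void blur5(const uint8_t *src, uint8_t *dst, int w) {
    static const int k[5] = { 1, 4, 6, 4, 1 };
    for (int x = 0; x < w; ++x) {
        int acc = 0;
        for (int i = -2; i <= 2; ++i) {
            int sx = x + i;
            if (sx < 0)     sx = 0;       /* clamp at left edge  */
            if (sx > w - 1) sx = w - 1;   /* clamp at right edge */
            acc += k[i + 2] * src[sx];
        }
        dst[x] = (uint8_t)((acc + 8) >> 4);  /* divide by 16 with rounding */
    }
}
```

Note the integer trick: the coefficients are kept as whole numbers and the division by 16 is deferred to a single rounded shift at the end, which keeps the intermediate math in plain ints.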

Most of VirtualDub's internal filters are convolution filters in one way or another; the difference is just that each is optimized for the specific filters used.

Thresholding

A threshold is simply a true/false comparison against a particular value:

if (pixel_value < threshold) do_stuff();

Thresholds are usually used to separate signal components, whether it be noise vs. video, edge vs. background, or whatever, and to turn the video into a binary image for input into another part of your filter.

The purpose of this section is not so much to teach you how thresholds work (they're too easy for that), but to warn you of them: thresholds are often wonderful noise amplifiers.  The reason is that the slightest bit of noise, no matter how small, may cause a signal to bounce back and forth across the threshold.  If you are not prepared for this, it's possible that this infinitesimal amount of noise can be drastically magnified by your filtering algorithm.  You can often see this in the ragged edges of an image that has been converted to black and white.

What can be done about this?  You can denoise your threshold map with medians or other despeckling algorithms.  You can also replace your threshold with a more continuous curve that causes your algorithm to degrade gracefully as the signal crosses the threshold.  In general, consider replacing "smart" algorithms that use binary decisions with "dumb" algorithms which use continuous formulas.  The dumb formulas may produce worse results in some cases, but tend to be more consistent in their output.
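One way to soften a hard threshold is to replace the 0/1 decision with a linear ramp around the threshold, so a small wobble in the input moves the output only slightly instead of flipping it.  A sketch, where the ramp width and the 0-256 fixed-point scale are illustrative choices:

```c
/* Instead of a hard true/false cut at `thresh`, return a blend
   weight in [0,256] (8-bit fixed point) that ramps linearly over
   a window of width 2*soft centered on the threshold. */
static int soft_threshold(int value, int thresh, int soft) {
    if (value <= thresh - soft) return 0;
    if (value >= thresh + soft) return 256;
    return ((value - (thresh - soft)) * 256) / (2 * soft);
}
```

The weight can then be used to mix "below threshold" and "above threshold" processing, rather than picking one or the other outright.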

Luminance vs. chrominance

Roughly speaking, luminance is the "brightness" of an image, while chroma is the "color." The luminance, or luma, is commonly computed as follows:

y = 0.299*r + 0.587*g + 0.114*b

You may see other equations with different coefficients, each of which is valid for different circumstances.  There are likewise various ways of reducing chrominance to two additional values -- U/V, Cr/Cb, I/Q, or various other systems -- but a simpler way to do it if you're not planning to store the chroma for very long is just to compute R-Y, G-Y, and B-Y.
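The luma equation above can be evaluated in integer arithmetic, which is convenient for per-pixel work.  A sketch using 16-bit fixed-point coefficients; the scaling and rounding choices here are ordinary fixed-point practice, not anything the SDK requires:

```c
#include <stdint.h>

/* Luma from a packed 0x00RRGGBB pixel using the 0.299/0.587/0.114
   weights scaled by 65536 (19595 + 38470 + 7471 == 65536, so a
   gray input maps back to itself). */
static int luma(uint32_t px) {
    int r = (int)((px >> 16) & 0xFF);
    int g = (int)((px >>  8) & 0xFF);
    int b = (int)( px        & 0xFF);
    return (19595*r + 38470*g + 7471*b + 32768) >> 16;
}
```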

The point?  The human eye is much less sensitive to detail in chroma than in luma, and in addition, due to the way composite video is encoded, the chroma tends to get hit worse by noise than the luma.  Noise reduction algorithms can thus benefit from attacking the chroma more aggressively than the luma.  In cases where chroma noise is especially serious (VHS tape comes to mind), working in luma/chroma can give significantly better results than working in RGB mode, although it is more complex.

Formally, the best way to work in luma/chroma is to use the conversion formulas to shift into a standard YCC color space, such as YCrCb, operate on the resultant bitmap, and then convert back.  However, since the transforms are linear, if your algorithm is also linear you can omit the chroma transform as noted above and simply work in (Y, R-Y, G-Y, B-Y) space instead.  Most noise reducers that rely on blurring to reduce noise can be adapted to this.  The TV filter in VirtualDub does this for fast chroma averaging.
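The (Y, R-Y, G-Y, B-Y) round trip described above might be sketched as follows; the names are illustrative, and doubles are used for clarity rather than speed:

```c
/* Split an RGB pixel into luma plus three color differences, and
   recombine by adding the luma back.  Because both steps are linear,
   a linear filter (e.g. a blur) applied to the difference planes in
   between fits cleanly into this scheme. */
typedef struct { double y, dr, dg, db; } YDiff;

static YDiff split(double r, double g, double b) {
    YDiff p;
    p.y  = 0.299*r + 0.587*g + 0.114*b;
    p.dr = r - p.y;
    p.dg = g - p.y;
    p.db = b - p.y;
    return p;
}

static void merge(YDiff p, double *r, double *g, double *b) {
    *r = p.y + p.dr;
    *g = p.y + p.dg;
    *b = p.y + p.db;
}
```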



VirtualDub external filter SDK 1.05 © 1999-2001 Avery Lee <phaeron@virtualdub.org>